
Claude/unified query planner a w8ax#65

Merged
AdaWorldAPI merged 7 commits into master from claude/unified-query-planner-aW8ax
Mar 30, 2026

Conversation

@AdaWorldAPI
Owner

No description provided.

claude added 7 commits March 30, 2026 19:23
F32x8 and F64x4 previously implemented only Add and Mul. The AVX2
fallback for F32x16 needs all four arithmetic ops (Add, Sub, Mul, Div)
on the 256-bit types.

Additive only: no existing code changed.

https://claude.ai/code/session_01BTATTRUACijvsK4hqmKUBR
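
As a rough illustration of what "all four arithmetic ops" on a 256-bit type looks like, here is a minimal sketch. It assumes a portable array-backed `F32x8` rather than the `__m256` wrapper the crate presumably uses, and the `lanewise_op!` macro is a hypothetical helper, not something taken from the PR.

```rust
use std::ops::{Add, Div, Mul, Sub};

/// Portable stand-in for a 256-bit lane type holding eight f32 values.
/// ASSUMPTION: the real F32x8 likely wraps `__m256`; an array is used here
/// so the sketch compiles on any target.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x8(pub [f32; 8]);

/// Hypothetical helper: implements one std::ops trait lane by lane.
macro_rules! lanewise_op {
    ($trait:ident, $method:ident, $op:tt) => {
        impl $trait for F32x8 {
            type Output = F32x8;
            #[inline]
            fn $method(self, rhs: F32x8) -> F32x8 {
                // Apply the operator to each pair of lanes.
                F32x8(std::array::from_fn(|i| self.0[i] $op rhs.0[i]))
            }
        }
    };
}

lanewise_op!(Add, add, +);
lanewise_op!(Sub, sub, -);
lanewise_op!(Mul, mul, *);
lanewise_op!(Div, div, /);
```

With these impls in place, `a - b` and `a / b` work on `F32x8` values the same way `a + b` and `a * b` do, which is what a wider type built from two halves needs in order to forward every operator.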
F32x16, F64x8, U8x64, I32x16, I64x8, U32x16, U64x8 — all composed
from 2× AVX2 halves (F32x8/F64x4 for float, array loops for integer).

Same API as simd_avx512.rs types. simd.rs will LazyLock-dispatch
between the two files based on runtime CPU detection.

Add/Sub/Mul/Div on F32x16 dispatch to 2× F32x8 operations (AVX2).
Integer types use array loops (AVX2 lacks 512-bit integer SIMD).

https://claude.ai/code/session_01BTATTRUACijvsK4hqmKUBR
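
The composition described in this commit can be sketched as follows. This is an illustration under assumptions, not the contents of simd_avx2.rs: `F32x8` is again an array-backed stand-in, the `lo`/`hi` field names and the `from_array`, `to_array`, and `splat` helpers are hypothetical, and wrapping integer addition is assumed because that is how hardware SIMD integer adds behave.

```rust
use std::ops::Add;

/// Array-backed stand-in for the 256-bit base type (assumption).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x8(pub [f32; 8]);

impl Add for F32x8 {
    type Output = F32x8;
    fn add(self, rhs: F32x8) -> F32x8 {
        F32x8(std::array::from_fn(|i| self.0[i] + rhs.0[i]))
    }
}

/// 512-bit float type composed from two 256-bit halves.
/// Field names `lo` and `hi` are hypothetical.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x16 {
    lo: F32x8,
    hi: F32x8,
}

impl F32x16 {
    pub fn from_array(a: [f32; 16]) -> Self {
        Self {
            lo: F32x8(std::array::from_fn(|i| a[i])),
            hi: F32x8(std::array::from_fn(|i| a[i + 8])),
        }
    }

    pub fn splat(v: f32) -> Self {
        Self { lo: F32x8([v; 8]), hi: F32x8([v; 8]) }
    }

    pub fn to_array(self) -> [f32; 16] {
        let mut out = [0.0f32; 16];
        out[..8].copy_from_slice(&self.lo.0);
        out[8..].copy_from_slice(&self.hi.0);
        out
    }
}

impl Add for F32x16 {
    type Output = F32x16;
    fn add(self, rhs: F32x16) -> F32x16 {
        // One 512-bit add becomes two independent 256-bit adds.
        F32x16 { lo: self.lo + rhs.lo, hi: self.hi + rhs.hi }
    }
}

/// Integer type backed by a plain array and a per-lane loop.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I32x16(pub [i32; 16]);

impl Add for I32x16 {
    type Output = I32x16;
    fn add(self, rhs: I32x16) -> I32x16 {
        // ASSUMPTION: wrapping semantics, matching hardware integer SIMD.
        I32x16(std::array::from_fn(|i| self.0[i].wrapping_add(rhs.0[i])))
    }
}
```

Sub, Mul, and Div would forward to the halves in the same way as Add; they are omitted to keep the sketch short.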
simd.rs now re-exports from simd_avx2 (2× __m256 composed types)
instead of simd_avx512 (__m512 native) for all 512-bit types.

This eliminates the SIGILL risk on x86_64 without AVX-512.
The AVX2 composed types use 2× F32x8 per F32x16 operation:
correct on all hardware, at the cost of two 256-bit instructions
where native AVX-512 would issue one.

BLAS hot paths (dot, axpy, gemm) still dispatch to AVX-512
kernels via native.rs LazyLock<Tier> — no performance regression
for inner loops. The simd.rs types serve HPC consumer code.

LazyLock<Tier> detection added to simd.rs (same pattern as native.rs).
F32x8/F64x4 (256-bit AVX2 base types) always re-exported from simd_avx512.

1422/1423 tests pass (1 pre-existing causal_diff failure).

https://claude.ai/code/session_01BTATTRUACijvsK4hqmKUBR
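
The `LazyLock<Tier>` pattern mentioned above can be sketched roughly like this. The `Tier` variants, `detect_tier`, `TIER`, `dot_lanes`, and `dot` are all hypothetical names, and the kernels are portable loops with different accumulator widths standing in for real AVX2 and AVX-512 kernels. The only real APIs used are `std::sync::LazyLock` and `std::arch::is_x86_feature_detected!`.

```rust
use std::sync::LazyLock;

/// Hypothetical capability tiers, ordered from least to most capable.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Scalar,
    Avx2,
    Avx512,
}

/// Probe the CPU once; falls through to Scalar on non-x86_64 targets.
fn detect_tier() -> Tier {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return Tier::Avx512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return Tier::Avx2;
        }
    }
    Tier::Scalar
}

/// Detection runs on first access and the result is cached afterwards.
pub static TIER: LazyLock<Tier> = LazyLock::new(detect_tier);

/// Portable stand-in kernel: N partial sums model N SIMD lanes.
fn dot_lanes<const N: usize>(a: &[f32], b: &[f32]) -> f32 {
    let len = a.len().min(b.len());
    let mut acc = [0.0f32; N];
    let mut i = 0;
    while i + N <= len {
        for l in 0..N {
            acc[l] += a[i + l] * b[i + l];
        }
        i += N;
    }
    let mut sum: f32 = acc.iter().sum();
    // Scalar tail for the elements that do not fill a full chunk.
    while i < len {
        sum += a[i] * b[i];
        i += 1;
    }
    sum
}

/// Dispatch on the cached tier; a real crate would call intrinsics here.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    match *TIER {
        Tier::Avx512 => dot_lanes::<16>(a, b),
        Tier::Avx2 => dot_lanes::<8>(a, b),
        Tier::Scalar => dot_lanes::<1>(a, b),
    }
}
```

Because the tier is resolved once and then read through a cheap deref, the per-call cost of dispatch is a single branch, which is presumably why the same pattern is used in both native.rs and simd.rs.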
- Remove burn from exclude list — all crates now in workspace
- Add [lib] section to burn Cargo.toml (edition 2024 requires explicit target)
- p64: 23 tests pass, phyllotactic-manifold: 14 tests pass
- Full workspace compiles clean

https://claude.ai/code/session_01BTATTRUACijvsK4hqmKUBR
7 sections, 11 tests, zero new types — only mappings:

1. SIMD manifold: expand_manifold_simd() via F64x8 + SPIRAL7_X/Y
2. SIMD attention: attend_batch_8() with VPOPCNTDQ fast path via simd_caps()
3. NARS bridge: resonance_to_nars(), nars_to_branch_byte()
4. CausalEdge64 compat: bit layout, palette addressing, layer mask mapping
5. ThinkingStyle cache: 6 styles in LazyLock, ordinal + name lookup
6. Semiring mapping: semiring name → CombineMode + ContraMode
7. DeepNSM palette: distance matrix → Palette64 interaction bitmap

Re-exports: Palette64, Palette3D, ThinkingStyle, HeelPlanes,
CombineMode, ContraMode, predicate, manifold_consts

p64 + phyllotactic-manifold added as path deps in Cargo.toml.

https://claude.ai/code/session_01BTATTRUACijvsK4hqmKUBR
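
Item 5 above (a style cache in a `LazyLock` with ordinal and name lookup) might look something like the following sketch. Everything here is illustrative: the `ThinkingStyle` struct is a stand-in for the re-exported type, the six style names are invented placeholders, and `StyleCache`, `style_by_ordinal`, and `style_by_name` are hypothetical names not taken from the PR.

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

/// Stand-in for the re-exported ThinkingStyle type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct ThinkingStyle {
    pub ordinal: u8,
    pub name: &'static str,
}

/// PLACEHOLDER names: the PR says there are six styles but does not list them.
const STYLE_NAMES: [&str; 6] = [
    "analytic",
    "creative",
    "focused",
    "exploratory",
    "critical",
    "intuitive",
];

/// Hypothetical cache: a dense table for ordinals plus a name index.
struct StyleCache {
    by_ordinal: Vec<ThinkingStyle>,
    by_name: HashMap<&'static str, usize>,
}

/// Built once on first access, then shared for the life of the process.
static STYLES: LazyLock<StyleCache> = LazyLock::new(|| {
    let by_ordinal: Vec<ThinkingStyle> = STYLE_NAMES
        .iter()
        .enumerate()
        .map(|(i, &name)| ThinkingStyle { ordinal: i as u8, name })
        .collect();
    let by_name = by_ordinal
        .iter()
        .enumerate()
        .map(|(i, s)| (s.name, i))
        .collect();
    StyleCache { by_ordinal, by_name }
});

/// Look up a style by its position in the table.
pub fn style_by_ordinal(ordinal: usize) -> Option<&'static ThinkingStyle> {
    STYLES.by_ordinal.get(ordinal)
}

/// Look up a style by name via the index into the ordinal table.
pub fn style_by_name(name: &str) -> Option<&'static ThinkingStyle> {
    STYLES.by_name.get(name).map(|&i| &STYLES.by_ordinal[i])
}
```

Both lookups return references into the same cached table, so no style is allocated more than once, which is consistent with the "zero new types, only mappings" framing of the commit.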
@AdaWorldAPI AdaWorldAPI merged commit 9bd8cc7 into master Mar 30, 2026
5 of 14 checks passed
2 participants